Lecture 5 - Choice Models
CIVE 461/861: Urban Transportation Planning
Outline
- Basis for choice modeling
- Theory
- Logit choice model
Basis for Choice Modeling
- Aggregate demand models (such as those we have been discussing) are based on observations for groups of travelers or on average relations for zones
- Discrete choice: the probability that an individual chooses a given option is a function of their socioeconomic characteristics & the relative attractiveness of the option
- Cannot estimate via least squares because the probability is unobserved; we only observe the choice
- Models are probabilistic & do not give an outcome value such as trips, so we must use probability concepts to generate discrete outcomes (summation over all observations to give the expected number choosing an option, or sample enumeration to allocate discrete outcomes)
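A minimal sketch of turning individual probabilities into aggregate numbers. The five probabilities are made-up values for illustration, and drawing one random outcome per traveler is only one possible way to allocate discrete choices.

```python
import random

# Hypothetical probabilities of choosing transit for five travelers
# (illustrative values, not taken from any estimated model).
p_transit = [0.10, 0.35, 0.50, 0.80, 0.25]

# Expected number choosing transit: sum of the individual probabilities.
expected_transit = sum(p_transit)

# Allocate a discrete outcome to each traveler by a random draw
# (1 = transit, 0 = other). The seed is fixed only for repeatability.
rng = random.Random(0)
choices = [1 if rng.random() < p else 0 for p in p_transit]
simulated_transit = sum(choices)

print(f"expected transit users: {expected_transit:.2f}")
print(f"simulated allocation:   {choices} (total {simulated_transit})")
```

The expected count is generally not an integer, while a simulated allocation always is; averaging many simulated allocations approaches the expected count.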
Basis for Choice Modeling
- Individuals derive utility from the use of alternatives, defined by a utility function often denoted by \(V\) \[V_{\text{peanut butter}} = 0.25 - 1.0\,\text{ALLERGY} - 0.5\,\text{PRICE} + 0.25\,\text{ORGANIC}\]
- Need probability value between 0 & 1. Various forms, generally with an s-shaped plot. Main ones are \[\text{Logit: } P_1=\frac{\exp(V_1)}{\exp(V_1)+\exp(V_2)}\] \[\text{Probit: } P_1=\int_{-\infty}^{\infty}\int_{-\infty}^{V_1-V_2+x_1}\frac{\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1}{\sigma_1}\right)^2-\frac{2\rho x_1 x_2}{\sigma_1 \sigma_2}+\left(\frac{x_2}{\sigma_2}\right)^2\right]\right\}}{2\pi\sigma_1 \sigma_2 \sqrt{1-\rho^2}}\, dx_2\, dx_1\]
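A minimal sketch of the utility function and the binary logit formula above. The coding of the variables (ALLERGY and ORGANIC as 0/1 dummies, PRICE in dollars), the example consumer, and the competing alternative's utility are assumptions made for illustration.

```python
import math

def peanut_butter_utility(allergy, price, organic):
    """Systematic utility V from the slide's example equation.

    Assumed coding: allergy and organic are 0/1 dummies, price is in dollars.
    """
    return 0.25 - 1.0 * allergy - 0.5 * price + 0.25 * organic

def binary_logit(v1, v2):
    """P_1 = exp(V_1) / (exp(V_1) + exp(V_2)), written as 1 / (1 + exp(V_2 - V_1))."""
    return 1.0 / (1.0 + math.exp(v2 - v1))

# Hypothetical consumer: no allergy, $3.00 jar, organic product.
v_pb = peanut_butter_utility(allergy=0, price=3.0, organic=1)

# Hypothetical competing alternative with the same systematic utility.
v_other = -1.0
p_pb = binary_logit(v_pb, v_other)

# The s-shape: probability as a function of the utility difference V_1 - V_2.
curve = [binary_logit(d, 0.0) for d in (-4, -2, 0, 2, 4)]

print(f"V = {v_pb:.2f}, P = {p_pb:.2f}")
print([round(p, 3) for p in curve])
```

Only the difference in utilities matters: adding the same constant to both \(V_1\) & \(V_2\) leaves the probability unchanged.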
Properties of Discrete Choice (DC)
- Based on theories of individual behavior & do not constitute physical analogies, so they tend to be stable over time & space
- Can be more statistically efficient than aggregate models because each observation is an outcome rather than using aggregations of potentially hundreds of observations
- DC models are less likely to suffer from bias due to correlation between aggregate units, i.e., ecological correlation (fallacy)
Ecological Fallacy
Theoretical Framework
- Individuals belong to a homogeneous population \(N\), act rationally, & possess perfect information, i.e., they always select the option that maximizes their net personal utility subject to legal, social, physical, &/or budgetary (both time & money) constraints
- There is a certain set \(A = \{A_1, \ldots, A_i, \ldots, A_I\}\) of available alternatives & a set \(X\) of vectors of measured attributes of individuals & their alternatives
- An individual \(n\) is endowed with a set of attributes \(x \in X\) & faces a choice set \(A(n) \subseteq A\)
- We will assume the choice set is predetermined for individuals
Theoretical Framework
- Modeler, an observer, does not possess complete information about all elements considered by the individual making the choice
- Modeler assumes utility \(U_{ni}\) can be represented by two components:
- A measurable, systematic, or representative part \(V_{ni}\), which is a function of the measured attributes \(x\)
- A random part \(\epsilon_{ni}\), which reflects idiosyncrasies & taste variation, together with measurement or observational errors made by the modeler
Theoretical Framework
- Modeler sets utility \(U_{ni}=V_{ni}+\epsilon_{ni}\) to address two apparent irrationalities
- That two individuals faced with the same situation & having the same personal attributes make different choices
- That an individual may not select the alternative that appears to be the best one
- Often assumed \(V_{ni}\) is given by \(\sum_{k=1}^{K} \beta_{ik} x_{nik}\), where \(\beta_{ik}\) are assumed constant for all individuals in the homogeneous set \(N\) but may vary across the \(I\) alternatives
Theoretical Framework
- Individual \(n\) chooses alternative \(A_i\) if \(U_{ni} \geq U_{nj} \; \forall A_j \in A(n)\)
- Or: \(V_{ni} - V_{nj} \geq \epsilon_{nj} - \epsilon_{ni}\) (notice swapping of \(i\) & \(j\))
- Not possible to know with certainty if the inequality holds, so use \[P_{ni} = \text{Prob}\left(\epsilon_{nj} \leq \epsilon_{ni} + (V_{ni} - V_{nj}),\ \forall A_j \in A(n)\right)\]
- Do not know the joint distribution of the residuals, so cannot derive an analytical expression for the model
- Can denote \(f(\epsilon) = f(\epsilon_1, \epsilon_2, \ldots, \epsilon_I)\) & know they are random variables with a certain distribution
Theoretical Framework
- Can note that the distribution of \(U\) is the same as that of \(\epsilon\) but with mean \(V\) rather than 0 \[P_{ni} = \int_{R_I} f(\epsilon)\, d\epsilon\] \[\text{where } R_I = \begin{cases}
\epsilon_{nj} \leq \epsilon_{ni} + (V_{ni} - V_{nj}), & \forall A_j \in A(n) \\
V_{ni} + \epsilon_{ni} \geq 0
\end{cases}\]
- Assuming independent & identically distributed (IID) residuals, \[f(\epsilon_1,\ldots,\epsilon_I) = \prod_{i=1}^{I} g(\epsilon_i)\]
- And (dropping \(n\) for simplicity) \[P_i = \int_{-\infty}^{\infty} g(\epsilon_i) \left[\prod_{j \neq i} \int_{-\infty}^{V_i-V_j+\epsilon_i} g(\epsilon_j)\, d\epsilon_j\right] d\epsilon_i\]
Generating the Logit Model
- Assuming IID Gumbel (Extreme Value Type I) residuals, which implies that residual differences are logistically distributed, gives the standard multinomial logit specification
- Termed the softmax function in the machine learning field \[P_{ni} = \frac{\exp(\mu V_{ni})}{\sum_{A_j \in A(n)} \exp(\mu V_{nj})}\]
- Where \(\mu\) is a scale parameter, normalized to 1.0 because it cannot be identified separately from the \(\beta\) coefficients
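A minimal sketch connecting the Gumbel assumption to the logit formula: the closed-form probabilities are compared with simulated shares obtained by adding standard Gumbel draws to each systematic utility & picking the largest total. The three utility values, the seed, & the number of draws are arbitrary assumptions for illustration.

```python
import math
import random

def mnl_probabilities(utilities, mu=1.0):
    """Multinomial logit (softmax) probabilities with scale parameter mu."""
    scaled = [mu * v for v in utilities]
    m = max(scaled)  # subtract the maximum to avoid overflow in exp
    expv = [math.exp(s - m) for s in scaled]
    total = sum(expv)
    return [e / total for e in expv]

def gumbel_draw(rng):
    """One standard Gumbel (Extreme Value Type I) draw by inverse transform."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_shares(utilities, draws, rng):
    """Share of draws in which each alternative has the highest U = V + epsilon."""
    counts = [0] * len(utilities)
    for _ in range(draws):
        total_utility = [v + gumbel_draw(rng) for v in utilities]
        counts[total_utility.index(max(total_utility))] += 1
    return [c / draws for c in counts]

V = [-1.0, -1.5, -2.2]  # hypothetical systematic utilities for three alternatives
rng = random.Random(5)

analytic = mnl_probabilities(V)
simulated = simulate_shares(V, 50000, rng)

for i, (a, s) in enumerate(zip(analytic, simulated)):
    print(f"alternative {i}: logit {a:.3f}, simulated {s:.3f}")
```

With enough draws the simulated shares should agree with the formula to within sampling error, which is the sense in which IID Gumbel residuals generate the logit model.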
Logit Model Parameter Estimation
- Commercial (& open-source) software exists to estimate model parameters using the method of maximum likelihood estimation
- Apollo (R & free)
- Biogeme (Python & free)
- STATA (commercial)
- LIMDEP (commercial)
- GAUSS (commercial)
- Parameter interpretation is similar to regression (t-statistics, goodness of fit (GoF), etc.)
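To make maximum likelihood concrete, here is a hand-rolled sketch for a binary logit with one constant & one coefficient, fitted by Newton-Raphson on synthetic data. It is not how the packages above work internally; the variable \(x\) (think of a travel time difference), the "true" parameters, the sample size, & the seed are all assumptions for illustration.

```python
import math
import random

def fit_binary_logit(x, y, iterations=25):
    """Maximize the log-likelihood of P(y=1) = 1 / (1 + exp(-(a + b*x)))."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        i_aa = i_ab = i_bb = 0.0   # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            g_a += yi - p
            g_b += (yi - p) * xi
            w = p * (1.0 - p)
            i_aa += w
            i_ab += w * xi
            i_bb += w * xi * xi
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: theta <- theta + I^{-1} g, using the 2x2 inverse.
        a += (i_bb * g_a - i_ab * g_b) / det
        b += (-i_ab * g_a + i_aa * g_b) / det
    # Standard errors from the inverse information matrix of the last
    # iteration (evaluated just before the final, negligible update).
    se_a = math.sqrt(i_bb / det)
    se_b = math.sqrt(i_aa / det)
    return (a, b), (se_a, se_b)

# Synthetic observations generated from assumed "true" parameters.
rng = random.Random(461)
true_a, true_b = 0.5, -1.0
x = [rng.uniform(-2.0, 2.0) for _ in range(2000)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(true_a + true_b * xi))) else 0
     for xi in x]

(a_hat, b_hat), (se_a, se_b) = fit_binary_logit(x, y)
t_a, t_b = a_hat / se_a, b_hat / se_b

print(f"constant:    {a_hat:.3f} (t = {t_a:.1f})")
print(f"coefficient: {b_hat:.3f} (t = {t_b:.1f})")
```

The t-statistics are read as in regression: a large absolute value suggests the parameter differs from zero.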
Direct & Cross Elasticity
- Direct elasticity: direct effect of changing a variable value related to a good on demand for the same good
- E.g., elasticity of transit demand wrt transit fare, transit travel time, or transit headway
- Cross elasticity: effect of changing a variable value related to a good on demand for a different good
- E.g., elasticity of transit demand wrt auto travel time
Logit - Direct Elasticity
Direct elasticity of the probability of an individual \(n\) choosing alternative \(i\) wrt a change in an attribute/independent variable \(X_{ik}\) with coefficient \(\beta_k\) (ignoring \(n\) subscripts for simplicity)
\[e_{direct} = \frac{\partial P_i}{\partial X_{ik}} \frac{X_{ik}}{P_i}=(1-P_i)X_{ik}\beta_k\]
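A short derivation of the result above, assuming \(V_i\) is linear in parameters, \(X_{ik}\) enters only \(V_i\), & \(\mu = 1\):

\[\frac{\partial P_i}{\partial X_{ik}} = \frac{\partial}{\partial X_{ik}} \left[\frac{\exp(V_i)}{\sum_j \exp(V_j)}\right] = \beta_k P_i - \beta_k P_i^2 = \beta_k P_i (1 - P_i)\]
\[e_{direct} = \beta_k P_i (1 - P_i) \frac{X_{ik}}{P_i} = (1-P_i) X_{ik} \beta_k\]

As a purely illustrative number: with \(P_i = 0.2\), \(X_{ik} = 2.0\), & \(\beta_k = -0.5\), the direct elasticity is \((1 - 0.2)(2.0)(-0.5) = -0.8\).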
Logit - Cross Elasticity
- Cross elasticity of the probability of an individual \(n\) choosing alternative \(i\) wrt a change in an attribute/independent variable \(X_{jk}\) with coefficient \(\beta_k\) for a different alternative
\[e_{cross} = -P_j X_{jk}\beta_k\]
- Above is same regardless of alternative \(i\). I.e., all alternatives have the same cross-elasticity wrt an attribute \(k\) of alternative \(j\)
- Above results from independence of irrelevant alternatives (IIA) property of basic logit model
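The uniform cross elasticity can be checked numerically. The sketch below compares the formula with central finite differences for a three-mode logit; the modes, constants, travel times, & time coefficient are hypothetical values chosen for illustration.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit probabilities with scale mu = 1."""
    m = max(utilities)
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

BETA_TIME = -0.05           # hypothetical coefficient on travel time (per minute)
ASC = [0.0, -0.4, -1.2]     # hypothetical constants: auto, bus, bike
times = [20.0, 35.0, 45.0]  # hypothetical travel times in minutes

def probabilities(t):
    """Mode probabilities for a list of travel times t."""
    return mnl_probabilities([a + BETA_TIME * ti for a, ti in zip(ASC, t)])

def numerical_elasticity(i, j, t, h=1e-5):
    """Elasticity of P_i wrt the travel time of alternative j (central difference)."""
    up, down = list(t), list(t)
    up[j] += h
    down[j] -= h
    dp = (probabilities(up)[i] - probabilities(down)[i]) / (2.0 * h)
    return dp * t[j] / probabilities(t)[i]

P = probabilities(times)
j = 1  # change the bus travel time

analytic_cross = -P[j] * times[j] * BETA_TIME
cross_auto = numerical_elasticity(0, j, times)
cross_bike = numerical_elasticity(2, j, times)

print(f"formula:          {analytic_cross:.4f}")
print(f"auto wrt bus time: {cross_auto:.4f}")
print(f"bike wrt bus time: {cross_bike:.4f}")
```

Auto & bike respond identically to a change in bus travel time, even though one might expect them to compete with bus to different degrees.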
Independence of Irrelevant Alternatives (IIA) Property
Consider we have two modes: auto (a) & bus (b)
\[\frac{P_a}{P_b} = \frac{\exp(V_a)}{\exp(V_b)} = \exp(V_a - V_b)\]
- I.e., the relative probability of choosing auto vs. bus depends only on the utilities of the two alternatives
- Now what if our buses are red (rb) & we paint half blue (bb)?
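A minimal sketch of what the basic logit model predicts for the painted buses, assuming for illustration that auto & bus have equal systematic utility & that bus color does not affect utility.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit probabilities with scale mu = 1."""
    m = max(utilities)
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

V_AUTO = 0.0  # hypothetical utility of auto
V_BUS = 0.0   # hypothetical utility of bus, identical for red & blue buses

# Before painting: auto vs. bus.
p_auto_before, p_bus_before = mnl_probabilities([V_AUTO, V_BUS])

# After painting: auto vs. red bus vs. blue bus, treated as three alternatives.
p_auto_after, p_rb, p_bb = mnl_probabilities([V_AUTO, V_BUS, V_BUS])

print(f"before: auto {p_auto_before:.3f}, bus {p_bus_before:.3f}")
print(f"after:  auto {p_auto_after:.3f}, red bus {p_rb:.3f}, blue bus {p_bb:.3f}")
print(f"auto / red bus ratio after: {p_auto_after / p_rb:.3f}")
```

The model preserves the auto to bus ratio for each bus color, so the auto share falls from 1/2 to 1/3, whereas intuition says paint should leave auto at 1/2 with each bus color at 1/4.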
IIA Cont.
- If the IIA assumption does not hold, we can make other error distribution assumptions
- Nested logit (can handle complex decision structures)
- Probit (normal error terms, very general model, but can be difficult to work with)